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Abstract 



■ We discuss an efficient implementation of the iterative proportional scaling pro- 
cedure in the multivariate Gaussian graphical models. We show that the computa- 

qq | tional cost can be reduced by localization of the update procedure in each iterative 

■ step by using the structure of a decomposable model obtained by triangulation of 
the graph associated with the model. Some numerical experiments demonstrate the 

| competitive performance of the proposed algorithm. 



1 Introduction 



Since Dempster [6[ introduced a multivariate Gaussian graphical model, also called a co- 
variance selection model, it has been investigated by many authors from both theoretical 
and practical viewpoints. On the theory of a Gaussian graphical model, see e.g. Whit- 
taker 22 1 , Lauritzen 16], Cox and Wermuth ^ and Edwards 10]. In recent years much 



effort has been devoted to application of the Gaussian graphical model to identify sparse 



large network systems, especially genetic networks (e.g. [8[, [18J, [9|), and the efficient 
implementation of the inference in the model has been extensively studied. In this article 
we discuss an efficient algorithm to compute the maximum likelihood estimator (MLE) 
of the covariance matrix in the Gaussian graphical models. 

When the graph associated with the model is a chordal graph, the model is called 
a decomposable model. For a decomposable model, the MLE of the covariance matrix 
is explicitly obtained. For general graphical models other than decomposable models, 
however, we need some iterative procedure to obtain the MLE. The iterative proportional 
scaling (IPS) procedure is one of popular algorithms to compute the MLE. 



I 



The IPS was first introduced by Deming and Stephan \^ to estimate cell probabilities 
in contingency tables subject to certain fixed marginals. Its convergence and statistical 



properties have been well studied by many authors (e.g. [13|, [111 ] ) a nd the IPS have been 
justified in a more general framework ([3j). Speed and Kiiveri 21[ first formulated the 
IPS in a Gaussian graphical model and gave a proof of its convergence. 

However, from a practically point of view, a straightforward application of the IPS is 
often computationally too expensive for larger models. In the contingency tables several 
techniques have been developed to reduce both storage and computational time of the 
IPS (e.g. [Hi, Q). Badsberg and Malvestuto proposed a localized implementation of 
the IPS by using the structure of decomposable models containing the graphical model. 
Such a technique is called the chordal extension. The local computation based on the 
chordal extension has been a popular technique in many fields for numerical computation 
of a sparse linear system(e.g. 20], 12]). 



In the present paper we describe a localized algorithm based on the chordal extension 
for improving the computational efficiency of the IPS in the Gaussian graphical models. 
Let A be the set of variables which corresponds to the set of vertices of the graph associated 
with the model. The straightforward implementation of the IPS requires approximately 
0(| A] 3 ) time in each iterative step for large models. In the similar way to the technique 
in Badsberg and Malvestuto [lj], we localize the update procedure in each step by using 
the structure of a decomposable model containing the model. The proposed algorithm is 
shown to require 0(|A|) time for some models. 

The problem of computing the MLE is equivalent to the positive definite matrix com- 
pletion problem. The proposed algorithm based on the chordal extension is closely related 



to the technique discussed by Fukuda et al. [12J in the framework of the positive definite 
matrix completion problem but not the same. 

As pointed out in Dahl et al. (3] , the implementation of the IPS requires enumeration 
of all maximal cliques of the graph and this enumeration has an exponential complexity. 
Hence the application of the IPS to large models may be limited. However in the case 
where the model is relatively small or the structure of the model is simple, it may be 
feasible to enumerate maximal cliques. In this article we consider such situations. 

The organization of this paper is as follows. In Section 2 we summarize notations and 
basic facts on graphs and give a brief review of Gaussian graphical models and the IPS 
algorithm for covariance matrices. In Section 3 we propose an efficient implementation 
of the update procedure of the IPS. In Section 4 we perform some numerical experiments 
to illustrate the effectiveness of the proposed procedure. We end this paper with some 
concluding remarks in Section 5. 



2 Background and preliminaries 

2.1 Preliminaries on decompositions of graphs 

In this section we summarize some preliminary facts on decompositions of graphs needed 
in the argument of the following sections according to Leimer 17L Lauritzen 16( and 
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Malvestuto and Moscarini 19 . 



Let Q = (A, E) be an undirected graph, where A denotes the set of vertices and E 
denotes the set of edges. A subset of A which induces a complete subgraph is called a 
clique of Q . Define the set of maximal cliques of Q by C. For a subset of vertices V, let 
QiV) denote the subgraph of Q induced by V. When a graph Q is not connected, we 
can consider each connected component of Q separately. Therefore we only consider a 
connected graph from now on. 

A subset S C A is said to be a separator of Q if Q(A \ S) is disconnected. For a 
separator S, a triple (A, B, S) of disjoint subsets of A such that AUBUS = Ais said 
to form a decomposition of Q. A separator S is called a clique separator if S is a clique 
of Q. For two non-adjacent vertices 5 and 6', S C A is said to be a (<5, 5')-separator if 
5 £ A and 5' G B for a decomposition (A, B,S). A (5, <5')-separator which is minimal 
with respect to inclusion relation is called a minimal (5, 5')-separator or a minimal vertex 
separator (Lauritzen (l6| ) . Denote by S the set of minimal vertex separators for all non- 
adjacent pairs of vertices in Q. 

A graph Q is called reducible if A contains a clique separator and otherwise Q is 
said to be prime. If G(V) is prime and G(V) is reducible for all V with V C V' C 
A, QiV) is called a maximal prime subgraph (mp-subgraph) of Q. For any reducible 



graph, its decomposition into mp-subgraphs is uniquely defined ( Leimer 1 1 71] . Malvestuto 
and Moscarini [19|]). Denote by V the set of subsets of A which induces mp-subgraphs of 
G and let |V| = M. Then there exists a sequence V\, . . . , Vu £ V such that for every 
m — 2, . . . , M there exists m! < m with 

v m > ^ v m n (Vi u • • ■ u v m -i). 

Such a sequence is called a D-ordered sequence. Let S m := V m fl (Vi U ■ ■ • U V^-i) for 
m = 2, . . . , M. Define S = {S2, ■ ■ ■ , 5 m }. Denote by Sc the set of clique separators of Q. 
Then 5 satisfy S = S fl Sc- So we call elements of S clique minimal vertex separators. 



Leimer [17J showed that reducible graphs always have a D-ordered sequence with V\ — V 
for any V G V. Hence a D-ordered sequence is not uniquely defined. However S is 
common for all D-ordered sequences. 

Example 1 (A reducible graph). The graph Q in Figure^ is an example of reducible 
graphs. Q has two clique minimal vertex separators S 2 ■— {3,4} and S 3 := {5,6}. Define 
V\, Vi and V3 by 

Vx := {1, 2, 3, 4}, V 2 := {3, 4, 5, 6}, V 3 := {5, 6, 7, 8} 
as in FigureUl Then V = {Vi, V2, V3} and the sequence V\, Vi, V3 is a D-ordered sequence. 



When Q is a chordal graph, V and S are equal to the set of maximal cliques C and 
the set of minimal vertex separators S of Q, respectively. Hence \C\ = M. A D-ordered 
sequence for a chordal graph is called a perfect sequence of maximal cliques. There exists 
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Vi 



V2 



v 3 



Figure 1: A reducible graph with eight vertices 



a perfect sequence of maximal cliques C\, . . . , Cm such that C\ — G for any C G C(e.g. 
Lauritzen 161] ). 

For a vertex 5 G A, let adj(5) denote the set of vertices adjacent to 5. When adj(5) 
is a clique, 8 is called a simplicial vertex. A simplicial vertex is contained in only one 
maximal clique C. Hence if 5 is simplicial and 5 G C, then adj(5) = C \ {5}. A sequence 
of vertices 8\, 62, ■ ■ ■ , 5\a\ is called a perfect elimination order of vertices of Q if 5i is a 
simplicial vertex in ^UlwIA'})- ^ * s wen known that Q is a chordal graph if and only if 
Q possesses a perfect elimination order (Dirac[3]). Let C±, . . . , Cm be a perfect sequence of 
maximal cliques of a chordal graph Q. Define R\ := Ci \ S2, S m := C m fl (C\ U ■ • • U C m _i) 
and _R m := C m \ for m — 2, . . . , M. Let r m := \R m \. Let £"\ . . . , 5^ be any sequence 
of vertices in R m . Then the sequence of vertices 

;M rM rM-1 rM- 



*1 



rM A M_1 A 1 A 1 

°r M ' i ' ' ' ' ' nvf-i ' " ' ' ' 1 ' ' ' ' ' n 



is a perfect elimination order of Q. We call it a perfect elimination order induced by the 
perfect sequence C\, . . . ,Cm- 

We introduce some notations and a basic formula for matrices needed in the following 
sections. Let A = {a^} be a |A| x |A| matrix. For two subsets Ai and A 2 of A, we let 

^AiA2 = { a ij }ieAi,jeA2 

denote a |Ai| x | 1 submatrix of A. Define 

(A" )a 1 a 2 - 

We let [Aa!A 2 ] A denote the |A| x |A| matrix such that 



A" 1 
A1A2 



([^AiA 3 







if i G Ai,j G A 2 
otherwise . 



Let A2 = Af and decompose a symmetric matrix A into blocks as 
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A1A1 

AiA 2 



A AlA2 

^A 2 A 2 



Here for notational simplicity we displayed A for the case that the elements of Ai are 
smaller than those of A 2 . Suppose that Aa 2 a 2 an d AajAi — ^4aiA 2 ( j 4a2A 2 )~ 1 
both positive definite. Then A is positive definite and 



^AiA 2 are 



4" 1 

A1A1 



(^AiAi — ^4AiA 2 ( y 4A2A 2 ) ^A^) 
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2.2 Gaussian graphical models 

Let + denote the set of |A| x |A| positive definite matrices K = {kij} such that 
kij = for all i, j 6 A with i ^ j and (i, j) ^ E. Then the Gaussian graphical model 
for | A| dimensional random variable Y = (Y^\ . . . , y(l A D)' associated with a graph Q is 
defined as 

Y ~ N\a\(jjl,E), K := XT 1 G M + {Q). 

= indicates the conditional independence between Y"M an d V^) given all other 
variables. In what follows, we identify Ai + (Q) with the corresponding graphical model. 
Let yi, . . . , y n be i.i.d. samples from .M + ((?). Define ?/ and IV by 



t=l 



8=1 



respectively. The likelihood equation is written as 

L(/x, iiQ « (detiv exp j-itrW - ^triv (y - - //)'} . 

The MLE of fi is y. The likelihood equations involving K are expressed as 

nKc l c = nT.cc = W cc , VC e C. 



(2) 



For a subset of vertices V C A, let -Kyy denote the MLE of K in the marginal model 
associated with the graph Q(V) based on the data in V-marginal sample only. Let S 
be a clique separator of Q and (A, B,S) be a decomposition of Q. Let V = A U S and 
V = 5 U S. Then the MLE K is known to satisfy 







A 




A 


k = 


k vv 


+ 


Kyiyi 


— n 



[(^fi)- 1 ]' 



(3) 



(e.g. Lauritzen (16J). More generally, for the set of mp-subgraphs V and the set of clique 
minimal vertex separators S, 



(4) 



ses 



As mentioned in the previous section, when the model is decomposable, V = C and S = S. 
Hence from (j2J), K is explicitly written by 

# = n [(Wcc)- 1 } A ~nJ2 [(Wss)- 1 } A • 



cec 



ses 



However for other graphical models, we need some iterative procedure for computing 
the first term on the right-hand side of (Jlj). The following IPS is commonly used for this 
purpose. Note that the second term on the right-hand side of (jlj) needs to be calculated 
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only once and is not involved in the iterative procedure. IPS consists of iteratively and 
successively adjusting T^qc f° r C G C as in (fJD. Let K l and E* = {K 1 )' 1 denote the 
estimated K and E at the t-th step of iteration, respectively Define D := A\C for 
C G C. Then the t-th iterative step of the IPS is described by the update rule of K l as 
follows. 

Algorithm (Iterative proportional scaling for K) 

Step t <— 1 and select an initial estimate K° such that K° G 

Step 1 Select a maximal clique C G C and update K as follows, 

(K% c - {WccY 1 + {K t - 1 ) CD {{K t - 1 ) DD )-\K t - 1 ) DC (5) 

(jr*) CD - (K t ~ 1 )cD 



Step 2 If i^* converges, exit. Otherwise t t + 1 and go to Step 1. 
From (JTJ, it is easy to see that 

{K% l c = (S*) CG = WWn. 

In Step 1, only the C-marginal of K is updated. Therefore we note that if the initial 
estimate K° satisfies K° G M + {Q), K l satisfies K* G M + (G) for all t. By using the 
argument of Csiszar [3], the convergence of the algorithm to the MLE 

lim K l = K, lim E* = S 

n^oo n— too 



is guaranteed (Speed and Kiiveri [2l| and Lauritzen [16|). 

The fact ((3]) suggests that the decomposition (A, B, S) for a clique separator S can 
localize the problem, that is, in order to obtain the MLE K, it suffices to compute the 
MLE of submatrix Kvv and Kyv'i where V = A U S and V = B U S. Especially if 
the decomposition by mp-subgraphs is obtained, we need only to compute Kvv f° r eac h 
V G V. 

From a complexity theoretic point of view, the t-th iterative step §5$) requires 0(|D| 3 + 
| -D | 2 1 C | + |£)||C| 2 ) time. The graphical model with 

C = {{1,2},{2,3},...,{|A|-1,|A|},{|A|,1}} 

is called the | A | -dimensional cycle model or |A| cycle model. Note that the cycle is prime. 
In the case of |A| cycle model, \C\ = 2 and \D\ = |A| — 2. Hence when |A| > 4, the 
iterative step (jSJ) requires 0((|A| — 2) 3 ) time. In the next section we propose a more 
efficient algorithm for computing (J5J) by using the structure of a chordal extension of a 
graph. 
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3 A localized algorithm of IPS 



From (CEJ), we note that flSj) is rewritten as 

(K% c = {WccY 1 + (K'-^cc - ((S*- 1 )^)" 1 

= (Wcc)" 1 + (K^) cc - ((K^Ur 1 - (6) 

In this section we provide an efficient algorithm to compute ((K t ~ 1 )^)~ 1 by using the 
structure of Q. For a graph Q, let Q* be a chordal graph obtained by triangulating Q. 
Such Q* is called a chordal extension of Q. Figure [2] represents an example of the five 
cycle model and its chordal extension. 




(i) the five cycle model (ii) a chordal extension of (i) 
Figure 2: The five cycle model and its chordal extension 

Let C*, . . . , C* M be a perfect sequence of the maximal cliques of Q* with C* D C. Let 
Sm '■= Cm n (C* U ■ ■ • U C^_ x ) for m = 2, . . . , M be minimal vertex separators of £*. We 
propose the following algorithm to compute ((i^* -1 )^,^,) -1 for each maximal clique C E C. 

Algorithm 1 (Computing ((if* -1 )^)" 1 ). 

Step m <- M and K* <- K 1 ' 1 . 

Step 1 If m 1, select a simplicial vertex 5 G Cj^ of £/*. 
If m — 1, select a vertex 8 £ C. 
Let Q = C* \ {5}. 

Step 2 Update AT£ Q by 

Step 3 Update C^, Q* and A as follows, 

C* m ^Q, G*^g*(A\{5}), A^A\{6}. 
If = S^, m <— m - 1. 

If = C, return i^^c- Otherwise, go to Step 1. 
Now we state the main theorem of this paper. 
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Theorem 1. The output K* cc of Algorithm^ is equal to {{K l 1 . 

Proof. Let 5 G C* M be a simplicial vertex in Q*. Define Q := C* M \ {5}, Q\ := A \ {5} 
and g 2 := A\Q f . Since adj(5) C C* M and e M + (G), {K l ~ l ) Q2& = 0. Noting that 

Q U Q2 = Qi, we have from (pQ) 

= (-^ <_1 )qiQi - (k^)jl ( ) ( (K^) SQ ) 

= {k " 1)q ^ ~ ( (A^^M**- 1 )*? ) 



and ((K^qIqJ- 1 e M + (G(Q 1 )), where (k^ss is the (5, 5)-th element of K 1 ' 1 . By 
iterating the procedure in accordance with the perfect elimination order induced by the 



perfect sequence C*, . . . , C* Ml we complete the proof. 



□ 



In Algorithm CD, the triangulation Q* is arbitrary However for every iterative step of 
adjusting the C-marginal, we have to use the perfect sequence with C{ D C. 



Example 2 (the five cycle model). 
expressed by 



K 



( feu 

fcl2 
fcl3 



\ 



Consider the five cycle model in Figure® (i). K is 

\ 



ki2 


ki 3 








k 2 2 





k 24: 








^33 





h 5 


&24 





^44 


&45 





&35 


^45 


^55 



/ 



By adding the fill-in edges {2, 3} and {3, 4}, a triangulated graph Q* can be obtained as in 
Figure® (U)- Consider the case where C = {1,2}. Define C{ = {1,2,3}, C 2 * = {2,3,4} 
and C3 = {3,4,5}. Then the sequence C*, C%, C3 is perfect and it induces a perfect 
elimination order 5,4,3,2,1. The update of K* in step 2 in accordance with the perfect 
elimination order is described as follows, 



-^34,34 *~ -^34,34 (^55) 



-^23,23 *~ -^23,23 (^44J 



K 



12,12 



K* 

-"■12,12 



(k 



33 > 



ft 35 
ft 45 

^24 
^34 

ft 13 
^23 



(^35 ^45j) 

(/c 24 fc 34 ), 

(^13 ^23)' 



Then K* 



t-l\-\ \-l 
12,12 J 



t-i\-i \-i 
cc) ■ 



□ 



We now analyze the computational cost of the proposed algorithm. In Step 2, the 
running time of the calculation of ([7]) is as follows, 



S 



• K* :— (kgg^Kgg requires \Q\ divisions ; 

• K% := K*Kgn requires \Q\ 2 multiplications ; 

• Kqq — requires \Q\ 2 subtractions. 

Define R[ := C{ \ C and R* m := C* m \ S* m for m = 2, . . . , M. \Q\ ranges over {\C m \ - j \ 
1 < j < 1 — 771 — ^0- Let /i, 7 and a measure the time units required by a 

single multiplication, division and subtraction, respectively. Then the running time of 
Algorithm [1] amounts to 

M R* m M R* m 

(a* + E E (ra - + 5 E E era - j) 

m=l j'=l m=l j'=l 

M 

= (/i + a) Ejl-^mll^ml 2 _ l^mll^ml ~ l^mH^m! 
m=l 

\R* rn \(\R* m \ + l)(2\R; n \ + l) } 
6 J 

+ ^ E (i^iiqi - (1 + I ^ I)IKJ } + 

m=l ^ ^ 

Since |C*J > \R^\, the computational cost of Algorithm [1] is ($Zm=i I^Cll^ml 2 )- Once 

((A'* -1 )^) -1 is obtained, 0(|C| 2 ) additions are required to compute ([6]). Note that we 
can compute (Wcc) -1 once before the IPS procedure. Hence the computational cost of the 

t-th iterative step amounts to O (\C\ 2 + J2m=i \^*m\ l^ml 2 ) • m the case of cycle models, 
|C| = 2, M = |A| - 2, |C£| = 3 and = 1. Thus the computational cost is 0(|A|). 
As mentioned in the previous section, the direct computation of ((A'* -1 )^) -1 requires 
0((A\C) 3 ) = 0(|-D| 3 ) time and in the case of cycle models it requires 0((|A| — 2) 3 ) time. 
Hence we can see the efficiency of the proposed algorithm. 



4 Numerical experiments for cycle models 

In this section we compare the localized IPS proposed in the previous section with the 
direct computation of the IPS by numerical experiments. We consider the | A| cycle models 
with |A| = 5, 10,50, 100,200,300,500, 1000. We set A = J| A |. We generate 100 Wishart 
matrices W with the parameter Jia| and the degrees of freedom |A| and computed the 
MLE A for | A| cycle models by using the proposed algorithm and the direct computation 
of the IPS. We set the initial estimate A := L&\- As a convergence criterion, we used 
Ylij — 10 -6 . The computation was done on a Intel Core 2 Duo 3.0 GHz CPU machine 
by using R language. Table [1] presents the average CPU time per one iterative step to 
update A*" 1 in ([6]) for both algorithms. 
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We can see the competitive performance of the proposed algorithm when |A| = 5 
and |A| > 200. However the direct computation is faster than the proposed one for 
|A| = 10, 50, 100. In the update procedure of direct computation ([6]), the computation 
of ((iT* -1 )!)!)) -1 is the most computationally expensive and in theory it requires 0(|-D| 3 ) 
time. Table [2] shows the average CPU time for computing a |A| x |A| inverse matrix by 
using R language on the same machine. As seen from the table, while the CPU time for 
computing the inverse of a matrix increases nearly at the rate 0(|A| 3 ) for |A| > 100, it 
increases too slowly for relatively small |A|. On the other hand, we can see from Table 
[T] that the CPU time of the proposed algorithm almost linearly increases in proportion 
to |A| which follows the theoretical result in the previous section. These are the reasons 
why the proposed algorithm is slower than the direct computation for relatively small 
|A|. When |A| > 200, however, the computational cost of )dd) ~ 1 is not ignorable 

and the proposed algorithm shows a considerable reduction of computational time. In 
practice the performances for large models are more crucial. In this sense the proposed 
algorithm is considered to be efficient. 



Table 1: CPU time per one iterative procedure for |A| cycle models 
|A| Algorithm [1] direct computation 



5 


1.590 


2.261 


10 


3.442 


2.523 


50 


18.63 


5.482 


100 


38.11 


17.48 


200 


78.87 


104.91 


300 


120.03 


361.01 


500 


225.94 


1292.1 


1000 


511.60 


6625.8 


(10~ 2 CPU time) 



Table 2: CPU time for calculating |A| x |A| inverse matrices 



|A| 


CPU time 


5 


0.027 


10 


0.031 


50 


0.058 


100 


0.286 


200 


1.789 


300 


5.602 


500 


24.666 


1000 


207.66 


(10" 2 CPU time) 
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5 Concluding remarks 



In this article we discussed the localization to reduce the computational burden of the IPS 
in two ways. We first showed that the decomposition into mp-subgraphs of the graph can 
localize the IPS. Next we proposed a localized algorithm of the iterative step in the IPS by 
using the structure of a chordal extension of the graphical model for each mp-subgraph. 
The proposed algorithm costs 0(|A|) in the case of cycle models and some numerical 
experiments confirmed the theory for large models. 

As mentioned in Section 1, the implementation of the IPS requires enumeration of all 
maximal cliques of the graph and this enumeration has an exponential complexity. In 
addition, the proposed algorithm also requires some characteristics of graphs, that is, a 
chordal extension, perfect sequences and perfect elimination orders of the chordal exten- 
sion. In this sense, the application of the IPS may be limited. However in the case where 
the structure of the model is simple or sparse, it may be feasible to obtain characteristics 
of graphs. In such cases, the proposed algorithm is considered to be effective. 
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